Penalized Regressions

Regression problems with many candidate predictor variables occur in a wide variety of scientific fields and business applications. These problems require statistical model selection to find an optimal model: one that is as simple as possible while still providing good predictive performance.

In the last decade, the higher prediction accuracy and computational efficiency of penalized regression methods have made them an attractive alternative to traditional selection methods.

Unlike subset selection methods, penalized regression methods do not explicitly select variables. Instead, they minimize the loss function subject to a penalty on the size of the regression coefficients. This penalty causes the regression coefficients to shrink toward zero, which is why penalized regression methods are also known as shrinkage or regularization methods. If the shrinkage is large enough, some regression coefficients are set exactly to zero.

Ridge Regression

Ridge regression, also known as Tikhonov regularization, solves a regression problem in which the regularization term is given by the l2-norm of the coefficients. Lambda is the penalty parameter: the higher the value of lambda, the stronger the penalty and the more the magnitude of the coefficients is reduced. Because ridge regression shrinks the parameters, it is often used to mitigate multicollinearity, and it reduces model complexity through coefficient shrinkage.
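A minimal sketch of this shrinkage effect, assuming scikit-learn is available; the simulated data, the near-duplicate column, and the lambda values (called `alpha` in scikit-learn) are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Simulated data with two nearly identical columns (multicollinearity).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=100)  # near-duplicate of column 0
y = X @ np.array([3.0, 1.5, 0.0, 0.0, 2.0]) + rng.normal(size=100)

# Larger alpha (lambda) -> stronger penalty -> smaller coefficient norm.
coef_norms = {}
for alpha in (0.1, 10.0, 1000.0):
    model = Ridge(alpha=alpha).fit(X, y)
    coef_norms[alpha] = float(np.linalg.norm(model.coef_))
```

Increasing `alpha` steadily reduces the l2-norm of the fitted coefficients, but ridge alone never sets them exactly to zero.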

The penalty function is based on the so-called l2 norm, which corresponds to the Euclidean distance. Ridge regression thus minimizes the following cost function:
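In standard notation (with y_i the response, x_ij the predictors, and lambda >= 0 the penalty parameter):

```latex
\hat{\beta}^{\text{ridge}}
= \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
+ \lambda \sum_{j=1}^{p} \beta_j^{2} \right\}
```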

Lasso Regression

Lasso regression performs l1-norm regularization, which adds a penalty equal to the sum of the absolute values of the coefficients. This type of regularization can produce sparse models with few nonzero coefficients: larger penalties push coefficient values closer to zero, which is ideal for producing simpler models, and some coefficients can become exactly zero and be eliminated from the model.
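A short sketch of this sparsity effect, again assuming scikit-learn; the simulated data and the `alpha` values are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Ten candidate predictors, only the first three truly informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_beta = np.zeros(10)
true_beta[:3] = [4.0, -2.0, 3.0]
y = X @ true_beta + rng.normal(size=200)

# A larger penalty drives more coefficients exactly to zero.
n_nonzero = {}
for alpha in (0.01, 0.5):
    model = Lasso(alpha=alpha).fit(X, y)
    n_nonzero[alpha] = int(np.sum(model.coef_ != 0))
```

With the small penalty essentially every candidate keeps a nonzero coefficient; with the larger one, the uninformative predictors are eliminated from the model.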

Lasso regression thus minimizes the following cost function:
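In the same notation as above, with the l2 penalty replaced by an l1 penalty:

```latex
\hat{\beta}^{\text{lasso}}
= \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
```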

Adaptive Lasso Regression

This estimator was first proposed by Zou (2006), and the idea behind it is straightforward: add weights that correct the bias of the lasso. If a variable is important, it should receive a small weight; it is then only lightly penalized and remains in the model. If it is not important, a large weight ensures that its coefficient is driven to zero and the variable is removed from the model.
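The two-step idea can be sketched as follows, assuming scikit-learn (which has no built-in adaptive lasso); the rescaling trick, the OLS pilot estimate, and all data and parameter values are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Eight candidate predictors, only two truly informative.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 8))
true_beta = np.array([5.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0])
y = X @ true_beta + rng.normal(size=200)

# Step 1: a pilot OLS fit gives the weights w_j = 1 / |beta_ols_j|
# (important variables -> small weight, unimportant -> large weight).
beta_ols = LinearRegression(fit_intercept=False).fit(X, y).coef_
w = 1.0 / np.abs(beta_ols)

# Step 2: an ordinary lasso on the rescaled predictors X_j / w_j is
# equivalent to a lasso with per-coefficient penalty lambda * w_j;
# dividing the fitted coefficients by w maps them back to the original scale.
lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X / w, y)
beta_adaptive = lasso.coef_ / w
```

The rescaling means the unimportant predictors face a much larger effective penalty than the important ones, so they are sent to zero while the important coefficients are barely shrunk.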

Adaptive lasso regression thus minimizes the following cost function:
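In the same notation, with per-coefficient weights w_j in the l1 penalty:

```latex
\hat{\beta}^{\text{alasso}}
= \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
+ \lambda \sum_{j=1}^{p} w_j \lvert \beta_j \rvert \right\},
\qquad w_j = \frac{1}{\lvert \hat{\beta}_j \rvert^{\gamma}}
```

where, as in Zou's proposal, the weights come from an initial consistent estimate (for example, OLS) and gamma > 0.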

Elastic Net Regression

Elastic net regression uses the penalties from both the lasso and ridge techniques to regularize regression models. The elastic net estimator is computed in two stages: it first finds the ridge regression coefficients and then applies a lasso-type shrinkage to them. To correct the effect of this double shrinkage, the coefficients are rescaled.
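A minimal sketch, assuming scikit-learn, where `ElasticNet` mixes the two penalties through `l1_ratio`; the simulated data and parameter values are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

# Two strongly correlated informative columns (0 and 5).
rng = np.random.default_rng(3)
X = rng.normal(size=(150, 6))
X[:, 5] = X[:, 0] + 0.05 * rng.normal(size=150)
y = X @ np.array([2.0, 0.0, 0.0, 1.5, 0.0, 2.0]) + rng.normal(size=150)

# l1_ratio = 1.0 is a pure lasso, 0.0 a pure ridge; 0.5 mixes both.
enet = ElasticNet(alpha=0.2, l1_ratio=0.5).fit(X, y)
```

Unlike a pure lasso, which tends to keep only one column of a correlated pair, the mixed penalty encourages correlated predictors to share the weight.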

Elastic net regression thus minimizes the following cost function:
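In the same notation, with two penalty parameters lambda_1 (lasso part) and lambda_2 (ridge part):

```latex
\hat{\beta}^{\text{enet}}
= \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^{2}
+ \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
+ \lambda_2 \sum_{j=1}^{p} \beta_j^{2} \right\}
```

In Zou and Hastie's original proposal, this "naive" minimizer is then rescaled by a factor of (1 + lambda_2) to correct the double shrinkage mentioned above.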